character-level model
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The paper presents a character-level convolutional network architecture and applies it to eight text classification problems on large datasets that the authors construct. It also presents comparative results from several word-based deep neural network models as well as bag-of-n-grams models. The character-level ConvNets outperform word-based models on four out of the eight datasets when word-based data augmentation is used. The clarity and quality of the writing are acceptable, but the presentation of the method and results could have been much clearer, and there are numerous grammatical and spelling errors.
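As background, character-level ConvNets of this kind typically consume text as a matrix of one-hot character vectors. A minimal sketch of that quantization step, with an illustrative alphabet and sequence length rather than the paper's exact configuration:

```python
# Illustrative character quantization for a character-level ConvNet input.
# Characters outside the alphabet map to an all-zero row; the text is
# truncated or zero-padded to a fixed sequence length.

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, seq_len=16):
    """Encode `text` as a seq_len x len(ALPHABET) one-hot matrix."""
    matrix = [[0] * len(ALPHABET) for _ in range(seq_len)]
    for pos, char in enumerate(text.lower()[:seq_len]):
        idx = CHAR_TO_IDX.get(char)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix

encoded = quantize("hello")
```

The resulting matrix is what the convolutional layers slide over, exactly as they would over an image.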
Character-level NMT and language similarity
We explore the effectiveness of character-level neural machine translation with the Transformer architecture across various levels of language similarity and training-data sizes, on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, a character-level vanilla Transformer-base model often lags behind subword-level segmentation. We confirm previous findings that the gap can be closed by fine-tuning already-trained subword-level models at the character level.
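The two input segmentations being compared can be illustrated concretely; the subword split below is a hand-picked hypothetical example, not the output of a trained tokenizer:

```python
# Subword-level vs character-level segmentation of the same text.
# Character-level input gives the model every character individually,
# at the cost of much longer sequences.

def char_segment(sentence):
    """Split a sentence into characters, marking spaces explicitly."""
    return [c if c != " " else "▁" for c in sentence]

subwords = ["▁pře", "klad"]  # hypothetical subword split of Czech "překlad"
chars = char_segment("překlad")
```

For the same sentence, the character-level sequence is several times longer, which is where the extra compute cost of character-level Transformers comes from.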
- Europe > United Kingdom > UK North Sea (0.05)
- Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.05)
- Europe > Italy > Tuscany > Florence (0.04)
- (12 more...)
- Research Report (0.50)
- Overview (0.46)
What is the best recipe for character-level encoder-only modelling?
This paper aims to benchmark recent progress in language understanding models that output contextualised representations at the character level. Many such modelling architectures and methods to train those architectures have been proposed, but it is currently unclear what the relative contributions of the architecture vs. the pretraining objective are to final model performance. We explore the design space of such models, comparing architectural innovations and a variety of different pretraining objectives on a suite of evaluation tasks with a fixed training procedure in order to find the currently optimal way to build and train character-level BERT-like models. We find that our best-performing character-level model exceeds the performance of a token-based model trained with the same settings on the same data, suggesting that character-level models are ready for more widespread adoption. Unfortunately, the best method to train character-level models still relies on a subword-level tokeniser during pretraining, and final model performance is highly dependent on tokeniser quality. We believe our results demonstrate the readiness of character-level models for multilingual language representation, and encourage NLP practitioners to try them as drop-in replacements for token-based models.
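One way a subword-level tokeniser can enter character-level pretraining is by defining which character spans get masked together. A toy sketch of that idea, with a hand-picked subword split rather than a real tokeniser (the exact objective in the paper may differ):

```python
import random

# Character-level masking informed by subword boundaries: mask every
# character of one randomly chosen subword span, so the model must
# reconstruct whole morphemes rather than isolated characters.

MASK = "#"

def mask_subword(chars, boundaries, rng):
    """Mask all characters of one randomly chosen (start, end) span."""
    start, end = rng.choice(boundaries)
    return [MASK if start <= i < end else c for i, c in enumerate(chars)]

chars = list("unhappiness")
boundaries = [(0, 2), (2, 7), (7, 11)]  # "un", "happi", "ness"
masked = mask_subword(chars, boundaries, random.Random(0))
```

This is exactly why tokeniser quality can leak into final model performance: bad boundaries produce bad masking targets.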
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- (8 more...)
DeepMath - Deep Sequence Models for Premise Selection
Chollet, François
We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.
- Asia > Middle East > Jordan (0.05)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (4 more...)
- Instructional Material (0.46)
- Research Report > Promising Solution (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Subword-Delimited Downsampling for Better Character-Level Translation
Edman, Lukas, Toral, Antonio, van Noord, Gertjan
Subword-level models have been the dominant paradigm in NLP. However, character-level models have the benefit of seeing each character individually, providing the model with more detailed information that ultimately could lead to better models. Recent works have shown character-level models to be competitive with subword models, but costly in terms of time and computation. Character-level models with a downsampling component alleviate this, but at the cost of quality, particularly for machine translation. This work analyzes the problems of previous downsampling methods and introduces a novel downsampling method which is informed by subwords. This new downsampling method not only outperforms existing downsampling methods, showing that downsampling characters can be done without sacrificing quality, but also leads to promising performance compared to subword models for translation.
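The core idea can be sketched as pooling per-character states within subword spans instead of within fixed-size windows. This is my illustrative reading, not the paper's exact method:

```python
# Subword-delimited downsampling sketch: collapse per-character hidden
# states to one state per subword span via mean-pooling, so sequence
# length shrinks at linguistically meaningful boundaries.

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def downsample(char_states, spans):
    """Collapse per-character states to one state per (start, end) span."""
    return [mean_pool(char_states[a:b]) for a, b in spans]

# Four characters with 2-d states, split into subwords "ab" + "cd".
states = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
pooled = downsample(states, [(0, 2), (2, 4)])
```

Fixed-window downsampling would pool characters from different subwords together; delimiting the windows at subword boundaries avoids mixing unrelated units, which is the quality argument the abstract makes.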
How Good is Your Chatbot? An Introduction to Perplexity in NLP
New, state-of-the-art language models like DeepMind's Gopher, Microsoft's Megatron, and OpenAI's GPT-3 are driving a wave of innovation in NLP. How do you measure the performance of these language models to see how good they are? In a previous post, we gave an overview of different language model evaluation metrics. This post dives more deeply into one of the most popular: a metric known as perplexity.
Understanding Perplexity Metrics in Natural Language AI
Imagine you're trying to build a chatbot that helps home cooks autocomplete their grocery shopping lists based on popular flavor combinations from social media.
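Perplexity, as commonly defined for language models, is the exponentiated average negative log-likelihood of the observed tokens. A minimal computation, with made-up per-token probabilities:

```python
import math

# Perplexity of a sequence given the model's probability for each
# observed token: exp of the mean negative log-likelihood.

def perplexity(token_probs):
    """Perplexity from the model's per-token probabilities."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity 4:
# it is as uncertain as a uniform choice among four options.
value = perplexity([0.25, 0.25, 0.25])
```

Lower perplexity means the model found the text less surprising, which is why it is a natural intrinsic metric for chatbot language models.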
Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets
Tiedemann, Jörg, Nakov, Preslav
This paper provides an analysis of character-level machine translation models used in pivot-based translation when applied to sparse and noisy datasets, such as crowdsourced movie subtitles. In our experiments, we find that such character-level models cut the number of untranslated words by over 40% and are especially competitive (improvements of 2-3 BLEU points) in the case of limited training data. We explore the impact of character alignment, phrase table filtering, bitext size, and the choice of pivot language on translation quality. We further compare cascaded translation models to the use of synthetic training data via multiple pivots, and we find that the latter works significantly better. Finally, we demonstrate that neither word- nor character-level BLEU correlates perfectly with human judgments, due to BLEU's sensitivity to length.
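One concrete source of BLEU's length sensitivity is its brevity penalty, which punishes candidates shorter than the reference exponentially. A sketch of the standard formula (the paper's exact scoring setup may differ):

```python
import math

# BLEU brevity penalty: 1.0 when the candidate is at least as long as
# the reference, otherwise an exponential penalty on the length ratio.

def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty for a single candidate/reference pair."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)

# A candidate half the reference length loses roughly 63% of its score
# from the penalty alone, regardless of n-gram precision.
half_length_bp = brevity_penalty(5, 10)
```

Since word-level and character-level systems can produce outputs of systematically different lengths, this penalty alone can move scores in ways human judges would not.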
- Europe > Czechia > Prague (0.05)
- North America > United States > New York > Monroe County > Rochester (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (23 more...)
Deep Search Query Intent Understanding
Liu, Xiaowei, Guo, Weiwei, Gao, Huiji, Long, Bo
Understanding a user's query intent behind a search is critical for modern search engine success. Accurate query intent prediction allows the search engine to better serve the user's needs by rendering results from more relevant categories. This paper aims to provide a comprehensive learning framework for modeling query intent at different stages of a search. We focus on the design of 1) character-level models that predict users' intents on the fly as they type in typeahead search; and 2) accurate word-level intent prediction models for complete queries. We experiment with various deep learning components for query text understanding. Offline evaluation and online A/B test experiments show that the proposed methods are effective at understanding query intent and efficient to scale for online search systems.
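The typeahead setting can be made concrete with a toy prefix-matching stand-in; a production system would use the character-level neural models the paper describes, and the intents and example queries below are invented:

```python
# Toy on-the-fly intent prediction: score each intent by how many of
# its example queries start with the prefix typed so far. A real system
# would run a character-level model over the partial query instead.

INTENT_EXAMPLES = {
    "people": ["john smith", "jane doe", "software engineer jobs"],
    "company": ["johnson & johnson", "java consulting llc"],
}

def predict_intent(prefix):
    """Return the intent whose example queries best match the prefix."""
    scores = {
        intent: sum(q.startswith(prefix) for q in examples)
        for intent, examples in INTENT_EXAMPLES.items()
    }
    return max(scores, key=scores.get)
```

The key property this illustrates is that the prediction can be updated after every keystroke, which is exactly what character-level (rather than word-level) input makes cheap.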
- Europe > Ireland > Connaught > County Galway > Galway (0.05)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)